In [ ]:
%%HTML
<style>
.container { width:100% }
</style>
This notebook shows how to do handwritten character recognition with logistic regression. I have adapted this example from an example by Aymeric Damien. He has a lot of nice notebooks discussing TensorFlow at https://github.com/aymericdamien/TensorFlow-Examples/.
In [ ]:
import gzip
import pickle
import random
import numpy as np
import matplotlib.pyplot as plt
The function $\texttt{vectorized_result}(d)$ takes a digit $d \in \{0,\cdots,9\}$ and returns a NumPy vector $\mathbf{x}$ of shape $(10,)$ such that $$ \mathbf{x}[i] = \left\{ \begin{array}{ll} 1 & \mbox{if $i = d$;} \\ 0 & \mbox{otherwise.} \end{array} \right. $$ This function is used to convert a digit $d$ into the expected output of a neural network that has an output unit for every digit.
In [ ]:
def vectorized_result(d):
    e = np.zeros((10, ), dtype=np.float32)
    e[d] = 1.0
    return e
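As a quick check of this encoding, the following sketch one-hot encodes a single digit and then a whole array of digits. The helper `one_hot_batch` is hypothetical and not part of the original notebook. It indexes into an identity matrix, which is one common way to encode many labels at once.

```python
import numpy as np

def vectorized_result(d):
    """Return the one-hot encoding of the digit d as a vector of shape (10,)."""
    e = np.zeros((10,), dtype=np.float32)
    e[d] = 1.0
    return e

def one_hot_batch(labels):
    """Hypothetical helper: one-hot encode an array of digits; result has shape (n, 10)."""
    # Row d of the 10x10 identity matrix is exactly the one-hot vector for digit d.
    return np.eye(10, dtype=np.float32)[np.asarray(labels)]

print(vectorized_result(3))
print(one_hot_batch([0, 9]).shape)
```

Each row of the batch result agrees with what `vectorized_result` returns for the same digit, so either form could be used to build the label matrix.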
The function $\texttt{load_data}()$ returns a tuple of the form $$ (\texttt{X_train}, \texttt{X_test}, \texttt{Y_train}, \texttt{Y_test}) $$ where each row $\textbf{x}$ of $\texttt{X_train}$ and $\texttt{X_test}$ is a 784-dimensional `numpy.ndarray` containing an input image, and the corresponding row $\textbf{y}$ of $\texttt{Y_train}$ or $\texttt{Y_test}$ is a 10-dimensional `numpy.ndarray` that encodes the correct digit for $\textbf{x}$. We do not use the validation data that are provided in the file `mnist.pkl.gz`.
In [ ]:
def load_data():
    with gzip.open('mnist.pkl.gz', 'rb') as f:
        train, validate, test = pickle.load(f, encoding="latin1")
    X_train = np.array([np.reshape(x, (784, )) for x in train[0]])
    X_test  = np.array([np.reshape(x, (784, )) for x in test [0]])
    Y_train = np.array([vectorized_result(y) for y in train[1]])
    Y_test  = np.array([vectorized_result(y) for y in test [1]])
    return (X_train, X_test, Y_train, Y_test)
In [ ]:
X_train, X_test, Y_train, Y_test = load_data()
X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
The function $\texttt{show_digits}(\texttt{rows}, \texttt{columns}, \texttt{offset})$ shows $\texttt{rows} \cdot \texttt{columns}$ images of the training data. The first image shown is the image at index $\texttt{offset}$. The figure is also saved to the file `digits.pdf`.
In [ ]:
def show_digits(rows, columns, offset=0):
    f, axarr = plt.subplots(rows, columns)
    for r in range(rows):
        for c in range(columns):
            i     = r * columns + c + offset
            image = 1 - X_train[i,:]
            image = np.reshape(image, (28, 28))
            axarr[r, c].imshow(image, cmap="gray")
            axarr[r, c].axis('off')
    plt.savefig("digits.pdf")
    plt.show()
In [ ]:
show_digits(5, 12)
In [ ]:
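import tensorflow.compat.v1 as tf  # the code below uses the TensorFlow 1.x graph API
tf.disable_v2_behavior()           # required when running under TensorFlow 2.x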
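In order to avoid a bug, we have to set the following environment variable. Setting `KMP_DUPLICATE_LIB_OK` to `TRUE` tells the Intel OpenMP runtime to continue instead of aborting when it detects that more than one copy of the library has been loaded.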
In [ ]:
%env KMP_DUPLICATE_LIB_OK=TRUE
We create placeholders to use for the data. Below, `None` stands for the yet unknown number of training examples.
In [ ]:
X = tf.placeholder(tf.float32, [None, 784]) # mnist data image of shape 28*28=784
Y = tf.placeholder(tf.float32, [None, 10]) # 0-9 digits recognition => 10 classes
Next, we create variables for the weights and biases. The variable `W` is the weight matrix, while `b` is the bias vector.
In [ ]:
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
We construct the model for logistic regression. `Y_pred` is the prediction vector. We use the softmax activation function. For a $d$-dimensional vector $\mathbf{z}$, this function is defined as
$$ \sigma(\mathbf{z})_i := \frac{\exp(z_i)}{\;\displaystyle\sum\limits_{j=1}^d \exp(z_j)\;} $$
This function is predefined in TensorFlow. Here, the vector $\mathbf{z}$ is defined as
$$ \mathbf{z} = \mathbf{x} \cdot W + \mathbf{b} $$
In [ ]:
Y_pred = tf.nn.softmax(tf.matmul(X, W) + b)
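To make the softmax formula concrete, here is a minimal NumPy sketch that applies it row by row. The function name `softmax` and the small array sizes are chosen for illustration only. Subtracting the row maximum before exponentiating is a standard stabilisation trick and is not necessarily how TensorFlow implements it internally.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax of a 2-D array z of shape (n, d)."""
    # Subtracting the row maximum leaves the result unchanged but avoids overflow in exp.
    shifted = z - np.max(z, axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=1, keepdims=True)

# Tiny stand-in for the model: 2 inputs with 3 features and 4 classes (illustrative sizes).
x = np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])
W = np.zeros((3, 4))
b = np.zeros(4)
p = softmax(x @ W + b)
print(p)  # with zero weights every class gets probability 0.25
```

With weights and biases initialised to zero, as in this notebook, every class starts out with the same probability.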
We use the cross entropy as a cost function. This is defined as $$ -\sum\limits_{i=1}^d \mathtt{Y}_i \cdot \ln(\mathtt{Y\_pred}_i) $$ Here, $\mathtt{Y}_i$ is the expected outcome, while $\mathtt{Y\_pred}_i$ is the output predicted by our model.
In [ ]:
cost = tf.reduce_mean(-tf.reduce_sum(Y * tf.log(Y_pred), axis=1))  # axis replaces the deprecated reduction_indices
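The cross entropy can be checked by hand on a small case. The sketch below mirrors the cost expression in NumPy. The function name `cross_entropy` is chosen for illustration. Because $\mathtt{Y}$ is one-hot, the inner sum reduces to $-\ln$ of the probability that the model assigns to the correct class.

```python
import numpy as np

def cross_entropy(Y, Y_pred):
    """Mean over all rows of -sum_i Y_i * ln(Y_pred_i)."""
    return np.mean(-np.sum(Y * np.log(Y_pred), axis=1))

# One example whose correct digit is 3, with a uniform prediction over the 10 digits.
Y = np.zeros((1, 10))
Y[0, 3] = 1.0
Y_pred = np.full((1, 10), 0.1)

loss = cross_entropy(Y, Y_pred)
print(loss)  # equals -ln(0.1) = ln(10)
```

A uniform prediction is exactly what the model outputs before training, since `W` and `b` start at zero, so the cost should begin near $\ln(10) \approx 2.30$. Note that this direct formula breaks down if a predicted probability is exactly zero, because $\ln(0)$ is undefined.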
We set some hyperparameters. We will use stochastic gradient descent with a minibatch size of $100$.
In [ ]:
learning_rate = 0.05
training_epochs = 50
batch_size = 100
num_examples = X_train.shape[0]
We use stochastic gradient descent to minimize this cost function.
In [ ]:
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
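TensorFlow derives the gradients automatically, so the update rule never appears explicitly in the notebook. As a rough sketch of what a single gradient descent step does for this model, the NumPy code below uses the closed-form gradient of the mean cross entropy of a softmax model. The names `sgd_step` and `loss`, the synthetic data, and the reduced input size of 20 features are assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z, axis=1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=1, keepdims=True)

def loss(W, b, X, Y):
    """Mean cross entropy of the softmax model on the batch (X, Y)."""
    return np.mean(-np.sum(Y * np.log(softmax(X @ W + b)), axis=1))

def sgd_step(W, b, X, Y, learning_rate):
    """One gradient descent step on the batch (X, Y)."""
    n = X.shape[0]
    P = softmax(X @ W + b)
    grad_W = X.T @ (P - Y) / n       # gradient of the loss with respect to W
    grad_b = np.mean(P - Y, axis=0)  # gradient of the loss with respect to b
    return W - learning_rate * grad_W, b - learning_rate * grad_b

# Synthetic minibatch: 100 examples, 20 features (not 784), 10 classes.
rng = np.random.default_rng(0)
X = rng.random((100, 20))
Y = np.eye(10)[rng.integers(0, 10, size=100)]

W, b = np.zeros((20, 10)), np.zeros(10)
before = loss(W, b, X, Y)
W, b = sgd_step(W, b, X, Y, learning_rate=0.05)
after = loss(W, b, X, Y)
print(before, after)
```

On this batch the cost after the step is lower than before it, which is the behaviour the optimizer repeats for every minibatch.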
The function $\texttt{next_batch}(s)$ returns the next batch of size $s$. It returns a pair of the form $(X, Y)$ where $X$ is a matrix of shape $(s, 784)$ and $Y$ is a matrix of shape $(s, 10)$. The function updates the global variable `count`, which holds the index of the first example of the next batch.
In [ ]:
count = 0
In [ ]:
def next_batch(size):
    global count
    X_batch = X_train[count:count+size,:]
    Y_batch = Y_train[count:count+size,:]
    count += size
    return X_batch, Y_batch
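Since `next_batch` depends on a global counter, the caller has to reset `count` at the start of every epoch. One possible alternative, sketched below, is a generator that keeps the position local. The name `iterate_batches` and its optional `rng` argument are hypothetical. Shuffling is an addition that the original `next_batch` does not perform.

```python
import numpy as np

def iterate_batches(X, Y, size, rng=None):
    """Yield consecutive (X_batch, Y_batch) pairs with `size` rows each.

    If a NumPy random generator is passed as rng, the rows are visited in
    shuffled order; otherwise in stored order, as next_batch does.
    A final incomplete batch is dropped.
    """
    n = X.shape[0]
    order = np.arange(n) if rng is None else rng.permutation(n)
    for start in range(0, n - size + 1, size):
        idx = order[start:start + size]
        yield X[idx], Y[idx]

# Small stand-in data: 10 "images" with 784 pixels and one-hot labels.
X_demo = np.arange(10 * 784, dtype=np.float32).reshape(10, 784)
Y_demo = np.eye(10, dtype=np.float32)

batches = list(iterate_batches(X_demo, Y_demo, size=4))
print(len(batches), batches[0][0].shape, batches[0][1].shape)
```

With 10 examples and a batch size of 4 this yields two full batches, matching the count computed by `int(num_examples / batch_size)` in the training loop.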
In [ ]:
%%time
init = tf.global_variables_initializer()
with tf.Session() as tfs:
    tfs.run(init)
    for epoch in range(training_epochs):
        count       = 0
        avg_cost    = 0.0
        num_batches = int(num_examples / batch_size)
        # Loop over all batches
        for i in range(num_batches):
            X_batch, Y_batch = next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = tfs.run([optimizer, cost], {X: X_batch, Y: Y_batch})
            # Compute average loss
            avg_cost += c / num_batches
        print("Epoch:", '%2d,' % epoch, "cost =", "{:.9f}".format(avg_cost))
    print("Optimization Finished!")
    # Test model
    correct = tfs.run(tf.equal(tf.argmax(Y_pred, 1), tf.argmax(Y, 1)), {X: X_test, Y: Y_test})
    print("Accuracy:", np.sum(correct) / len(correct))
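The accuracy computation at the end of the training cell compares, for every test image, the index of the largest predicted probability with the index of the 1 in the one-hot label. The following NumPy sketch performs the same comparison on a hand-made example. The function name `accuracy` and the three-class data are illustrative assumptions.

```python
import numpy as np

def accuracy(Y_pred, Y):
    """Fraction of rows where the most probable class matches the one-hot label."""
    return np.mean(np.argmax(Y_pred, axis=1) == np.argmax(Y, axis=1))

# Four examples with three classes; the true classes are 0, 1, 2, 1.
Y_true = np.eye(3)[[0, 1, 2, 1]]
Y_hat = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.4, 0.3],   # predicted class 1, true class 2: wrong
                  [0.2, 0.5, 0.3]])

print(accuracy(Y_hat, Y_true))  # 3 of 4 rows are correct: 0.75
```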